AITopics | feature fusion

feature fusion

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion

Neural Information Processing Systems, Mar-22-2026, 16:39:14 GMT

Visual grounding is a common vision task that involves grounding descriptive sentences to the corresponding regions of an image. Most existing methods use independent image-text encoding and apply complex hand-crafted modules or encoder-decoder architectures for modal interaction and query reasoning. However, their performance drops significantly when dealing with complex textual expressions. This is because that paradigm uses only limited downstream data to fit the multi-modal feature fusion, so it is effective only when the textual expressions are relatively simple. Moreover, given the wide diversity of textual expressions and the uniqueness of downstream training data, the existing fusion module, which extracts multimodal content from a visual-linguistic context, has not been fully investigated.

artificial intelligence, machine learning, proceedings

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.64)

Attention in Convolutional LSTM for Gesture Recognition

Neural Information Processing Systems, Mar-16-2026, 19:00:50 GMT

Convolutional long short-term memory (LSTM) networks have been widely used for action/gesture recognition, and different attention mechanisms have also been embedded into LSTM or convolutional LSTM (ConvLSTM) networks. Building on previous gesture recognition architectures that combine a three-dimensional convolutional neural network (3DCNN) with ConvLSTM, this paper explores the effects of attention mechanisms in ConvLSTM. Several variants of ConvLSTM are evaluated: (a) removing the convolutional structures of the three gates in ConvLSTM, (b) applying the attention mechanism to the input of ConvLSTM, and (c) reconstructing the input gate and (d) the output gate, respectively, with a modified channel-wise attention mechanism. The evaluation results demonstrate that the spatial convolutions in the three gates scarcely contribute to spatiotemporal feature fusion, and that the attention mechanisms embedded into the input and output gates do not improve the feature fusion. In other words, when it takes spatial or spatiotemporal features as input, ConvLSTM mainly contributes temporal fusion along the recurrent steps to learn long-term spatiotemporal features. On this basis, a new variant of LSTM is derived in which the convolutional structures are embedded only into the input-to-state transition of LSTM. The code of the LSTM variants is publicly available.
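The distinction between convolutional gates and a convolutional input-to-state path can be made concrete with a small NumPy sketch of one recurrent step. This is one plausible reading of the derived variant, not the authors' released code: the three gates are computed from globally average-pooled input and hidden state by a dense layer (no spatial convolution, as in variant (a)), while a 3×3 convolution is applied only to the input when forming the candidate cell state. The parameter layout, the 1×1 state-to-state mixing, and all names (conv2d_same, init_params, lstm_variant_step) are hypothetical choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv2d_same(x, w):
    """Naive zero-padded 'same' 2D convolution (cross-correlation, as in DL frameworks).
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k. Returns (C_out, H, W)."""
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, height, width = x.shape
    out = np.zeros((c_out, height, width))
    for i in range(height):
        for j in range(width):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) window centred on (i, j)
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

def init_params(c_in, c_hid, k=3, seed=0):
    # Hypothetical parameter layout used only by this sketch.
    rng = np.random.default_rng(seed)
    return {
        "W_gates": rng.normal(0, 0.1, (3 * c_hid, c_in + c_hid)),  # dense gates, no conv
        "b_gates": np.zeros(3 * c_hid),
        "W_xc": rng.normal(0, 0.1, (c_hid, c_in, k, k)),  # conv on the input-to-state path
        "W_hc": rng.normal(0, 0.1, (c_hid, c_hid)),        # 1x1 state-to-state mixing (assumed)
        "b_c": np.zeros(c_hid),
    }

def lstm_variant_step(x, h, c, p):
    """One recurrent step. x: (C_in, H, W); h, c: (C_hid, H, W)."""
    # Gates: global-average-pool x and h, then one dense layer (no spatial conv).
    pooled = np.concatenate([x.mean(axis=(1, 2)), h.mean(axis=(1, 2))])
    gates = sigmoid(p["W_gates"] @ pooled + p["b_gates"])
    i, f, o = (gate[:, None, None] for gate in np.split(gates, 3))  # broadcast over H, W
    # Candidate cell state: spatial convolution only on the input-to-state path.
    g = np.tanh(conv2d_same(x, p["W_xc"])
                + np.einsum("oc,chw->ohw", p["W_hc"], h)
                + p["b_c"][:, None, None])
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

# Usage: unroll over 5 steps of random 4-channel 6x6 feature maps.
params = init_params(c_in=4, c_hid=8)
h, c = np.zeros((8, 6, 6)), np.zeros((8, 6, 6))
for t in range(5):
    x_t = np.random.default_rng(t).normal(size=(4, 6, 6))
    h, c = lstm_variant_step(x_t, h, c, params)
```

Because the gates see only pooled statistics, each gate value is shared across all spatial positions. The spatial structure of the state therefore comes entirely from the convolution on the input, which is the property the abstract's findings point toward.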

artificial intelligence, deep learning, machine learning

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

ea35a58ee3da13c01a69df2a819386b3-Paper-Conference.pdf

Neural Information Processing Systems, Feb-12-2026, 15:08:02 GMT

face recognition, intra-set relationship, recognition

Neural Information Processing Systems

Country:

North America > United States > Michigan > Ingham County > Lansing (0.05)
North America > United States > Michigan > Ingham County > East Lansing (0.05)
Asia > Middle East > Jordan (0.04)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Vision > Face Recognition (0.50)

SwinTrack: A Simple and Strong Baseline for Transformer Tracking

Neural Information Processing Systems, Feb-9-2026, 14:37:04 GMT

We expect SwinTrack to serve as a solid baseline for Transformer tracking and facilitate future research.

artificial intelligence, machine learning, natural language

Neural Information Processing Systems

Country:

North America > United States (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Learning Cooperative Trajectory Representations for Motion Forecasting

Neural Information Processing Systems, Feb-8-2026, 15:17:10 GMT

Motion forecasting is an essential task for autonomous driving, and utilizing information from infrastructure and other vehicles can enhance forecasting capabilities.

artificial intelligence, machine learning, trajectory

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Information Technology (0.89)
Transportation > Ground > Road (0.49)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

TransformerFusion: Monocular RGB Scene Reconstruction using Transformers

Neural Information Processing Systems, Feb-7-2026, 10:44:02 GMT

We introduce TransformerFusion, a transformer-based 3D scene reconstruction approach.

artificial intelligence, machine learning, natural language

Neural Information Processing Systems

Country: Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.89)

DARNet: Dual Attention Refinement Network with Spatiotemporal Construction for Auditory Attention Detection

Neural Information Processing Systems, Dec-24-2025, 23:41:42 GMT

At a cocktail party, humans exhibit an impressive ability to direct their attention. The auditory attention detection (AAD) approach seeks to identify the attended speaker by analyzing brain signals, such as EEG signals. However, current AAD algorithms overlook the spatial distribution information within EEG signals and lack the ability to capture long-range latent dependencies, limiting the model's ability to decode brain activity. To address these issues, this paper proposes a dual attention refinement network with spatiotemporal construction for AAD, named DARNet, which consists of the spatiotemporal construction module, dual attention refinement module, and feature fusion & classifier module. Specifically, the spatiotemporal construction module aims to construct more expressive spatiotemporal feature representations, by capturing the spatial distribution characteristics of EEG signals. The dual attention refinement module aims to extract different levels of temporal patterns in EEG signals and enhance the model's ability to capture long-range latent dependencies. The feature fusion & classifier module aims to aggregate temporal patterns and dependencies from different levels and obtain the final classification results. The experimental results indicate that DARNet achieved excellent classification performance, particularly under short decision windows. While maintaining excellent classification performance, DARNet significantly reduces the number of required parameters. Compared to the state-of-the-art models, DARNet reduces the parameter count by 91%.
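The role of the feature fusion & classifier module (aggregating temporal patterns from different levels into one decision) can be illustrated with a minimal NumPy sketch. The abstract does not specify the aggregation, so the choices here are assumptions: each level's output is mean-pooled over time, the pooled vectors are concatenated, and a single linear layer with softmax produces class probabilities. The two-class setup (e.g. attended speaker left vs right) and the function name fuse_and_classify are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def fuse_and_classify(level_features, W, b):
    """level_features: list of arrays shaped (T_l, D); T_l may differ per level.
    Mean-pool each level over time, concatenate, then linear layer + softmax."""
    pooled = [f.mean(axis=0) for f in level_features]  # each (D,)
    fused = np.concatenate(pooled)                      # (num_levels * D,)
    return softmax(W @ fused + b)

# Usage with random stand-ins for two refinement levels of different lengths.
rng = np.random.default_rng(0)
D, n_classes = 16, 2  # assumed feature size and number of classes
levels = [rng.normal(size=(128, D)), rng.normal(size=(64, D))]
W = rng.normal(0, 0.1, (n_classes, len(levels) * D))
b = np.zeros(n_classes)
probs = fuse_and_classify(levels, W, b)
```

Pooling over time before concatenation is what lets levels with different temporal resolutions be fused into a single fixed-size vector. It also keeps the classifier small (here 2 × 32 weights), in the spirit of the low parameter count the abstract emphasizes.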

artificial intelligence, dual attention refinement network, machine learning

Neural Information Processing Systems

Industry: Health & Medicine > Therapeutic Area > Neurology (0.59)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.99)

EfficientECG: Cross-Attention with Feature Fusion for Efficient Electrocardiogram Classification

Deng, Hanhui; Li, Xinglin; Luo, Jie; Wu, Di

arXiv.org Artificial Intelligence, Dec-9-2025

The electrocardiogram (ECG) is a useful diagnostic signal that can detect cardiac abnormalities by measuring the electrical activity generated by the heart. Because it is rapid, non-invasive, and richly informative, ECG has many emerging applications. In this paper, we study novel deep learning technologies to effectively manage and analyse ECG data, with the aim of building an accurate and fast diagnostic model that can substantially reduce the burden on medical workers. Unlike existing ECG models that exhibit a high misdiagnosis rate, our deep learning approaches automatically extract the features of ECG data through end-to-end training. Specifically, we first devise EfficientECG, an accurate and lightweight classification model for ECG analysis based on the existing EfficientNet model, which can effectively handle high-frequency, long-sequence ECG data with various lead types. On top of that, we propose a cross-attention-based feature fusion model of EfficientECG for analysing multi-lead ECG data with multiple features (e.g., gender and age). Our evaluations on representative ECG datasets validate the superiority of our model over state-of-the-art works in terms of high precision, multi-feature fusion, and lightweight design.
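Cross-attention-based fusion of ECG features with metadata such as age and gender can be sketched with single-head scaled dot-product cross-attention in NumPy. The abstract does not say which side supplies the queries, so this sketch assumes the ECG feature tokens are queries that attend to one embedded token per metadata feature, with a residual connection afterwards. The age scaling, the gender encoding, the random projections, and the example values are all hypothetical stand-ins, not EfficientECG's actual design.

```python
import numpy as np

def cross_attention(queries, context, Wq, Wk, Wv):
    """Single-head scaled dot-product cross-attention.
    queries: (N, D) tokens that attend; context: (M, D) tokens attended to."""
    q, k, v = queries @ Wq, context @ Wk, context @ Wv  # (N, d), (M, d), (M, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])              # (N, M)
    scores -= scores.max(axis=-1, keepdims=True)         # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax over context
    return weights @ v, weights                          # (N, d), (N, M)

rng = np.random.default_rng(0)
D = 32
ecg_tokens = rng.normal(size=(50, D))  # stand-in for backbone feature tokens over time
age, is_female = 63.0, 1.0             # arbitrary example metadata values
meta_raw = np.array([[age / 100.0, 0.0],   # token 1 carries (scaled) age
                     [0.0, is_female]])    # token 2 carries gender
W_meta = rng.normal(0, 0.1, (2, D))
meta_tokens = meta_raw @ W_meta        # (2, D): one embedded token per metadata feature
Wq, Wk, Wv = (rng.normal(0, D ** -0.5, (D, D)) for _ in range(3))
attended, attn = cross_attention(ecg_tokens, meta_tokens, Wq, Wk, Wv)
fused = ecg_tokens + attended          # residual fusion (assumed)
```

Each row of attn shows how much a given ECG token draws on each metadata feature. Swapping the arguments (metadata as queries, ECG tokens as context) is an equally valid design if a compact summary vector is preferred.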

artificial intelligence, ecg data, machine learning

arXiv.org Artificial Intelligence

2512.03804

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Pedestrian Crossing Intention Prediction Using Multimodal Fusion Network

Li, Yuanzhe; Müller, Steffen

arXiv.org Artificial Intelligence, Nov-26-2025

Pedestrian crossing intention prediction is essential for the deployment of autonomous vehicles (AVs) in urban environments. Ideal prediction provides AVs with critical environmental cues, thereby reducing the risk of pedestrian-related collisions. However, the prediction task is challenging due to the diverse nature of pedestrian behavior and its dependence on multiple contextual factors. This paper proposes a multimodal fusion network that leverages seven modality features from both visual and motion branches, aiming to effectively extract and integrate complementary cues across different modalities. Specifically, motion and visual features are extracted from the raw inputs using multiple Transformer-based extraction modules. A depth-guided attention module leverages depth information to guide attention towards salient regions in another modality through comprehensive spatial feature interactions. To account for the varying importance of different modalities and frames, modality attention and temporal attention are designed to selectively emphasize informative modalities and effectively capture temporal dependencies. Extensive experiments on the JAAD dataset validate the effectiveness of the proposed network, which achieves superior performance compared to the baseline methods.
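The modality attention and temporal attention described in this abstract can be sketched as two successive softmax-weighted sums in NumPy. First, per frame, weight the modalities; then weight the fused frames to get one clip-level vector. The scoring function (tanh followed by a learned vector), the feature sizes, the random weights, and the sigmoid crossing head are assumptions for illustration. Only the count of seven modalities comes from the abstract.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def modality_then_temporal_attention(feats, w_mod, w_time):
    """feats: (M, T, D) = modalities x frames x feature dim.
    1) Modality attention: score each modality per frame, softmax over M -> (T, D).
    2) Temporal attention: score each fused frame, softmax over T -> (D,)."""
    mod_scores = np.tanh(feats) @ w_mod                # (M, T)
    alpha = softmax(mod_scores, axis=0)                # weights over modalities, per frame
    per_frame = (alpha[..., None] * feats).sum(axis=0)  # (T, D)
    time_scores = np.tanh(per_frame) @ w_time          # (T,)
    beta = softmax(time_scores, axis=0)                # weights over frames
    clip = (beta[:, None] * per_frame).sum(axis=0)     # (D,)
    return clip, alpha, beta

rng = np.random.default_rng(0)
M, T, D = 7, 16, 64  # seven modalities as in the abstract; T and D assumed
feats = rng.normal(size=(M, T, D))
clip, alpha, beta = modality_then_temporal_attention(
    feats, rng.normal(size=D), rng.normal(size=D))
logit = clip @ rng.normal(size=D)   # hypothetical crossing / not-crossing head
p_cross = 1.0 / (1.0 + np.exp(-logit))
```

Keeping alpha and beta as outputs is a deliberate choice. They show which modality dominated each frame and which frames drove the final prediction, which is the "varying importance" the abstract refers to.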

artificial intelligence, machine learning, prediction

arXiv.org Artificial Intelligence

2511.20008

Country: Europe > Switzerland > Basel-City > Basel (0.24)

Genre: Research Report (1.00)

Industry:

Transportation > Infrastructure & Services (0.73)
Transportation > Ground > Road (0.73)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
